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Understanding the organization of reaction fluxes in cellular 
metabolism from the stoichiometry and the topology of the under- 
lying biochemical network Is a central issue in systems biology. In 
this task, it is important to devise reasonable approximation schemes 
that rely on the stoichiometric data only, because full-scale kinetic 
approaches are computationally affordable only for small networks 
(e.g. red blood cells, about 50 reactions). Methods commonly em- 
ployed are based on finding the stationary flux configurations that 
satisfy mass-balance conditions for metabolites, often coupling them 
to local optimization rules (e.g. maximization of biomass produc- 
tion) to reduce the size of the solution space to a single point. Such 
methods have been widely applied and have proven able to repro- 
duce experimental findings for relatively simple organisms in specific 
conditions. Here we define and study a constraint-based model of 
cellular metabolism where neither mass balance nor flux stationarity 
are postulated, and where the relevant flux configurations optimize 
the global growth of the system. In the case of E. coli, steady flux 
states are recovered as solutions, though mass-balance conditions 
are violated for some metabolites, implying a non-zero net produc- 
tion of the latter. Such solutions furthermore turn out to provide the 
correct statistics of fluxes for the bacterium E. coli in different envi- 
ronments and compare well with the available experimental evidence 
on individual fluxes. Conserved metabolic pools play a key role in 
determining growth rate and flux variability. Finally, we are able to 
connect phenomenological gene essentiality with 'frozen' fluxes (i.e. 
fluxes with smaller allowed variability) in E. coli metabolism. 
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Cellular metabolism involves a complex network of inter- 
actions and cross-regulations between metabolites, pro- 
teins and genes. While our knowledge of regulatory functions 
at the genetic level and at the level of protein-protein interac- 
tions is still in its infancy, the biochemical network of reactions 
that describes metabolism has been characterized in detail for 
many organisms [1-3] . Most information about the metabolic 
network is contained in the matrices B = {6f } and A = {af } 
describing, respectively, the input and output stoichiometric 
coefficients of each metabolite ^ (ranging from 1 to P) for 
all the metabolic reactions i (1 to A*'). Their knowledge al- 
lows for the definition of constraint-based models from which 
a prediction of metabolic fluxes is possible [4-6]. These mod- 
els typically rely on a steady-state assumption that reflects 
the timescale separation between the (faster) equilibration of 
metabolic processes and the (slower) dynamics of genetic regu- 
lation. Under this condition, the concentration of metabo- 
lite ^ and the flux fi > of reaction i are constant in time 
and globally linked by a set of mass-balance conditions: 

N 

d 

— =^(ar-6r)^, = , Vm=1,...,P [1] 

i=l 

or, in matrix notation, (A — B)!^ = 0, where v = {i/i, . . . , I'm} 



is a flux vector. The problem is that of finding a flux vector 
satisfying the set of P linear equations ((TJ . For real metabolic 
networks, the above system is usually under-determined as 
N > P (e.g. E. Coli has around 1100 reactions and less than 
800 metabolites [1]), so that multiple solutions exist and one 
has to specify which of these are the working, physical states 
of the network. 

In organisms with high functional specificity or whose 
main objective is to produce certain specified metabolites, 
like e.g. red blood cells, physical states are taken to be all 
those consistent with the mass-balance conditions. In such 
cases, it is important to be able to explore the solution space 
of ([T| uniformly. Uniform sampling can be achieved in small 
networks with Monte Carlo methods [7] and, more recently, 
message-passing algorithms have been employed [8,9]. How- 
ever computational considerations still prevent the application 
of such techniques to explore more general aspects of larger 
metabolic networks. 

For more complex organisms, one normally complements 
([T]) with the assumption that the physical state of the net- 
work obeys a specific optimization principle. In flux-balance 
analysis (FBA, [5]) the choice usually falls on the maximiza- 
tion of biomass production, a useful proxy of the growth ca- 
pabilities of an organism. FBA has been widely applied to 
different organisms (mostly bacteria) to investigate general 
aspects of metabolism, like flux distributions in different en- 
vironments [10], evolutionary dispensability of enzymes [11], 
or the plasticity of the reaction network [12]. Minimization 
of metabolic adjustment (MOMA, [6]) is instead able to pre- 
dict the re-organization of fluxes after a reaction knockout by 
minimizing the distance between the FBA solution and that 
obtained from ([T| after the knockout. 

Here we consider a constraint-based model of metabolic 
activity with the aim of characterizing flux states correspond- 
ing to optimal net metabolic production. We neither assume 
constancy-in-time of fluxes, nor do we impose mass-balance 
constraints on metabolites. Rather, we allow for these to 
be recovered as properties of the solutions. This assump- 
tion turns out to be able to reproduce the empirical statistics 
of fluxes for the bacterium E. coli in different environments 
with a remarkable accuracy, and the physical and biochemical 
origin of the robustness of the emerging picture can be com- 
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pletely understood. Further biological insight can be gained 
by comparing the variability of individual fluxes with data on 
phenomenological gene essentiality. 

Model definitions 

The abstract setup we consider was originally introduced by 
J. Von Neumann to model growth in production economies 
as an autocatalytic process [13]. We consider a system of A'' 
chemical reactions between P reagents, with fluxes and con- 
centrations evolving in discrete time f = 0, 1, . . . (we leave the 
timescale unspecified). For now, the system is assumed to 
be purely autocatalytic, so that the total input of any given 
metabolite at a certain time step must come from (all of or 
part of) the output at the previous time step. Let Si(t) 
denote the scale at which reaction i operates at time t, so 
that the total input and output of metabolite ^ are given 
by /^(t) = E> '5>(t)&f and O^t) = Siit)a'^, respectively. 
Stability requires that ^(t) := 0''(t) - /"(t + 1) > at aU 
times, otherwise the system would need an outside metabolic 
source to survive. We focus our attention on the feasibility 
of dynamical paths with constant growth rate p > 0, where 
/''(t + 1) — p I'^{t). It is easy to see that, in such paths, reac- 
tion rates must behave as Si{t) = Sip^, with constants Si > 
satisfying the P linear constraints 

JV 

c'* = ^«-p6ns. >0 , \/^^=l,...,P . [2] 

i=l 

The main assumption now is that the network's physical state 
s* — {s*} corresponds to one of those for which the growth 
rate p attains its maximum possible value p* compatible with 
the constraints The rationale for this choice lies in the fact 
that, under general conditions, it can be proven that paths 
with the optimal use of resources coincide, apart from an ini- 
tial transient, with those of maximal expansion [14]. Note 
that the Si's are essentially the discrete-time version of the 
continuous time fluxes Vi of Eq. ([1} . 

It is intuitive that the solution space of @ shrinks as p 
is increased starting from p = 0, where any flux configura- 
tion is a viable state. Quite importantly, then, for a fixed set 
of input-output relations, the solutions of are bound to 
coincide with those of ([T| if p* = 1 and c'' = V^. 

The typical behaviour of Von Neumann's model can be 
fully appreciated analytically in the case of random graphs. 
Depending on their size and topology, autocatalytic networks 
with random stoichiometry give rise to very different optimal 
states. As in other constrained optimization problems [15], 
the key control parameter is the ratio N/P — n oi variables- 
to-constraints. In random topologies [16,17] as n increases the 
system crosses over from a contracting phase with p* < 1 to 
an expanding one with p* > 1, passing through a stationary 
regime with p* = 1 in which reaction fluxes are constant in 
time as in FBA (though the values at which fluxes settle may 
be different). 

Application to E. coli 

A natural reaction network can be modeled as an autocat- 
alytic system when the uptake reactions that provide re- 
sources and account for exchanges of metabolites with the 
surrounding environment are included. We have applied the 
Von Neumann scheme to the cellular metabolism of the bac- 



terium E. coli, as reconstructed in [1]. To set the stage, three 
different operations have to be performed. 

(a) Environment selection. To flx the environment where 
the cell lives, we have deflned a set of external metabolites on 
which we applied uptake fluxes. We have considered three dis- 
tinct environmental conditions: (i) isolated cell, without up- 
takes; (ii) minimal environment, with uptakes on a restricted 
set of metabolites [10], namely CO2, K, NH4, PI, O2, SO4 and 
either one of glu-L, succ or glc); (iii) rich environment, with 
uptakes on all external metabolites. 

(b) "Leaf-removal". Once the network is built, one has to 
remove from the internal metabolites those that are never pro- 
duced, since the corresponding constraints are only satisfied 
by taking p = or by applying a null flux to the corresponding 
reactions (as is actually done in FBA). 

(c) Accounting for reversibility. We have disposed of re- 
versible reactions by introducing two separate fluxes and tak- 
ing the absolute difference as the positive net flux. 

The size of the resulting network, i.e. A'^ and P, is ulti- 
mately different for different environments. 

We calculate the maximum growth rate p* numerically 
via a generalized MinOver algorithm [18], detailed in [17], in 
which solutions are found at fixed p and then p is gradually in- 
creased. It has been shown rigorously [17] that this algorithm 
converges to a solution at a fixed p if at least one solution ex- 
ists. Moreover, when multiple solutions occur (in which case 
they form a connected set), the algorithm provides a uniform 
sampling of the solution space [19]. Anticipating that indeed 
many fiux vectors satisfy ((2)1 at p* for E. coli, the latter is a 
particularly important advantage since a uniform sampling is 
required to characterize the solution space. 

Characterization of the solutions 

For the metabolic network of E. coli, we have found p* = 
0.999 ± O.QQl independently of the environmental conditions 
we have tested, so that the state of optimal growth is com- 
patible with one with constant fiuxes. The distribution of 
reaction fluxes at p* (see Fig. [1} displays a regime (less than 
two decades) with a scaling form P{s) ~ s~~' with exponent 
close to —1 (in reasonable agreement with the experimental 
evidence based on a limited set of measured fluxes [10,20]) fol- 
lowed by a cutoff. Note that the flux histogram is remarkably 
stable over different solutions. This scenario has been partially 
reproduced by FBA [10] , but the solution obtained optimizing 
the biomass production systematically overestimates 7. 

In order to compare individual fluxes with experiments, we 
aimed at studying the solution obtained in conditions similar 
to those described in [20], focusing our attention on 17 fluxes 
from the central metabolism, as in [6]. Unfortunately, we are 
unable to reproduce in detail the M9 medium used in [20] , and 
can only fix the uptakes of three metabolites, namely glucose, 
CO2 and O2 identically to [20]. For the remaining part of the 
environment, we chose to simulate a minimal medium and 
fixed the four remaining external uptakes at arbitrary values 
(we did not observe a strong dependency of the results on 
these parameters). Results are shown in Fig. [2] We stress 
that the medium we consider is not strictly identical to that 
used in experiments. It is worth noting (see also [21]) that 
experiments employ ^^C-labeled carbon sources as substrates 
for a growing bacterial culture kept at constant density, and 
that the relative abundance of metabolites can be captured 
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by nuclear-magnetic-resonance or mass spectroscopy. Reac- 
tion fluxes of difi'erent metabolic pathways are then inferred 
assuming a model of the reaction network. 

With stationarity recovered at p* — l, as discussed above 
the difference between the Von Neumann solution and that of 
FBA arises from the fact that not all c^'s attain their lowest 
allowed value. This is clearly seen in Fig. |3] In each solution, 
some metabolites exist with a non-zero c'', implying that in 
the steady state a net production of such metabolites occurs. 
This may also signal an incompleteness of the stoichiometric 
data. 

To get further insight on the existence of multiple flux 
states compatible with ([2| at p* and on the shape of the so- 
lution space, a rough but effective way consists in monitoring 
the mean overlap between different solutions. Given two so- 
lutions a and /3 at flxed p, we define their overlap qa/s as: 

This quantity equals one when and coincide and be- 
comes smaller and smaller as the distance between and 
in the space of fiux vectors increase^. In Fig. |4] we report 
the behaviour of the mean overlap (qa/j) (the angular brack- 
ets representing an average over many pairs of solutions) for 
E. coli and for a network with the same topology and N/P 
but random Gaussian stoichiometry. Random networks pro- 
vide an important benchmark against which the behaviour of 
real networks can be tested, in order to highlight the extent 
to which observations are specific of the particular organism 
that is being analyzed. In particular, the role of topology and 
stoichiometry can be fully analyzed. For instance, in [17] it is 
shown that the power-law behaviour of the flux distribution 
cannot be ascribed to the specific network topology. 

One sees that for the artificial metabolic network (ga/3) 
is an increasing function of p that approaches 1 as p —* p* 
(Fig. IDs). Moreover the overlap histogram has a marked 5- 
peak at g = 1 whose mass increases aa p p* (data not 
shown). These results indicate that the optimal solution is 
unique. On the other hand, the behaviour of {qoip) for E. colt 
suggests that the volume of solutions stops contracting when 
p reaches roughly 0.8 from below. In particular, at p* multiple 
solutions survive. From the corresponding histogram. Fig. [4]:, 
we see that only about the 30% of reactions have an overlap 
close to 1. In a different jargon, one may say that roughly 
30% of the variables are frozen (i.e. assume the same value on 
all solutions of the constrained optimization problem), while 
the remaining are free. The existence of frozen variables char- 
acterizes many random constraint satisfaction problems [22], 
but it is normally hard to identify topological motifs where 
variables are more likely to be frozen. In the present case, it 
is reasonable to expect that for purely structural reasons re- 
action chains are entirely frozen when the first reaction of the 
chain is. This is indeed confirmed by the map of frozen/free 
fluxes in E. colt's central metabolism, displayed in Fig. O The 
chain-like part of glycolisis appears indeed to be frozen. 

The existence of frozen fluxes raises the obvious question 
of their biological signiflcance. To address this point, we have 
correlated reaction overlaps g^'^ with the essentiality of the 
corresponding genes according to the notion of universal es- 
sentiality used in [23], which combines phenomenological rel- 
evance (a gene is essential if knocking it out the cell dies) 



with evolutionary retention (the presence of the gene in dif- 
ferent species). In [23], 55 essential genes of E. coli involved in 
metabolism have been identifled, that are also present in 80% 
of 32 different bacterial genomes. We have been able to link 
52 of such genes to reactions in the reconstructed network. 
It turns out, see Fig. [S] that 43 of such genes correspond to 
reactions with overlap larger than 0.8 and that only 7 genes 
relate to reactions with an overlap significantly smaller than 
809(0. This suggests that frozen fiuxes, which in Von Neu- 
mann's framework are allowed a very limited variability if a 
state of optimal growth is to be kept, may carry an evolution- 
ary significance. 

The role of conserved moieties 

To trace back the physical origin of the results presented 
above, we have studied the rank of the matrix A — pB as- 
sociated to the P linear constraints @ as a function of the 
parameter p, see Fig. [T] The singularity occurring at p = 1 
is related to the presence of conserved pools of metabolites, 
groups of reagents whose total concentration is constant in 
time [24]. Their existence is due to the fact that the con- 
centrations of metabolites of a certain pool are always cou- 
pled with a common functional group. For example the three 
metabolites ATP, AMP and ADP belong to a pool of cofac- 
tors that preserve the adenylate moiety [25] . This implies that 
a change in the concentration of a given metabolite can not 
be accomplished without considering the entire pool to which 
that metabolite belongs. 

A conserved pool g is formally defined by a P-dimensional 
Boolean vector of elements Zg such that Zg — 1 if p £ g and 
Zg — if p ^ g satisfying 

f^4'(af-6r) = , Vi = l,...,iV. [4] 

Such conservation laws manifest themselves precisely at p = 1, 
By virtue of ((2)1, the relation '^i^ZgC'^ > must hold for any 
g. It follows that either p* < 1, or all fiuxes connected to 
a metabolite belonging to a conserved pool must be equal to 
zero. Since the null solution s = must be discarded on obvi- 
ous physical grounds, if all fiuxes are connected to metabolites 
in a conserved pool then necessarily p* < 1. This is consis- 
tent with the results obtained in different environmental con- 
ditions and implies that this scenario must be stable against 
small perturbations of the network topology. 

Moreover, from (|4]) and ((2]), it is easy to see that, for p = 1, 
p £ g implies = 0, i.e. for a metabolite belonging to a con- 
served pool the mass-balance condition must be strictly valid. 
We have shown instead (Fig. [3} that the values of the con- 
straints are not always zero: for some metabolites, the mass 
balance condition is not reached and we can assert that they 



In computing it, one should consider a flux to be zero whenever its value is belovw a threshold e, 
to take into account the fact that there is a loss of information about relative fluctuations between 
different solutions in fluxes smaller than e. We have chosen e — 10~^ to ensure that overlaps 
are not overestimated. Results obtained with e — 10^^, 10^*^ and 10^^ are identical. We 
have furthermore taken into account the fact that the overlap between null fluxes must be defined 
by consistency to be equal to one. 

^Whether a gene is 'essential' or not (according to the definition of [23]) depends on the choice 
of the environment. In particular, [23] considers a rich medium as the environment. For compar- 
ison, we have considered theoretical values of the overlaps for fluxes calculated assuming a rich 
environment. Note that, in principle, the same gene can be linked to a more or less variable flux 
in different environments. However, the results presented are qualitatively preserved in the minimal 
environment with different carbon sources. 
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do not belong to any conserved pool (within the stoichiomet- 
ric description employed). We conclude that in the state of 
optimal growth with p* = 1, in addition to the stationarity of 
reaction rates, a spontaneous condition of mass balance holds 
for most, not all, metabolites. 

In summary, the Von Neumann model relies on the as- 
sumption that the arrangement of metabolic fluxes follows a 
principle of growth maximization. It does not imply a pri- 
on cither the mass balance or the stationarity of fluxes, but 
these two conditions are essentially recovered at the maxi- 
mum growth rate. This is due to the presence of conserved 
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Fig. 1. Flux distributions at p* for E. coli in a minimal environment with three different carbon sources: glutamate (glu), succinate (sue) and glucose (glc). Here, 
= 1053 and P = 630 (glu); = 1054 and P = 630 (sue); TV = 1057 and P = 631 (glc). Inset: the same distributions plotted after a binning of the abscissa. 
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Fig. 2. Comparison of reaction fluxes predicted by the Von Neumann scheme with 17 fluxes measured in [20] and analyzed in [6]. The different reactions (in no specific order) 
are reported on the horizontal axis (labeled as in [6]), their corresponding fluxes (relative to the glucose uptake) on the vertical axis. Red markers denote experimental values, 
blue ones theoretical predictions (see however text for details on the environmental conditions considered). Note that many flux vectors satisfy Von Neumann's conditions at 
optimal growth. 300 different solutions are reported here. 
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Fig. 3. Values of for different metabolites (horizontal axis) in 500 different solutions. The colour code is reported on the right. The white background corresponds to 
= 0. Colored marks indicate a net production of the corresponding metabolite in that solution, or a mass balance violation. One sees that while mass balance holds for 
most metabolites, some are consistently unbalanced while others may or may not be, depending on the solution. 




0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1 

q q 

Fig. 4. (a,b) Mean overlap between 500 different solutions as function of p in E. coli and in a random metabolic network; the last point on the abscissa corresponds to 
the value of p* estimated by simulation, (c.d) Overlap histogram P{q) at p* in E.coli and in the random network. Note the different scales of the y axes in the lower panels. 
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Fig. 5. E. coil's central metabolism: nodes represent metabolites, an arrow joining two nodes is present when a reaction exists converting one into the other. Red (resp. 
green) links denote frozen (resp. free) reactions, with overlap larger (resp. smaller) than 0.9. 
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Fig. 7. The rank of the matrix associated to the system of inequalities |(2) for E. Colt (isolated cell) as a function of p. Around the singularity the rank equals the number 
of metabolites; conserved pools of metabolites are only present exactly at p = 1. 
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